Localizing by Describing: Attribute-Guided Attention Localization for Fine-Grained Recognition
A key challenge in fine-grained recognition is how to find and represent
discriminative local regions. Recent attention models are capable of learning
discriminative region localizers only from category labels with reinforcement
learning. However, not utilizing any explicit part information, they are not
able to accurately find multiple distinctive regions. In this work, we
introduce an attribute-guided attention localization scheme where the local
region localizers are learned under the guidance of part attribute
descriptions. By designing a novel reward strategy, we are able to learn to
locate regions that are spatially and semantically distinctive with a
reinforcement learning algorithm. The attribute labeling requirement of the
scheme is more amenable than the accurate part location annotation required by
traditional part-based fine-grained recognition methods. Experimental results
on the CUB-200-2011 dataset demonstrate the superiority of the proposed scheme
on both fine-grained recognition and attribute recognition.
WordSup: Exploiting Word Annotations for Character based Text Detection
Imagery texts are usually organized as a hierarchy of several visual
elements, i.e. characters, words, text lines and text blocks. Among these
elements, the character is the most basic one for various languages such as
Western, Chinese, and Japanese, as well as mathematical expressions. It is natural and
convenient to construct a common text detection engine based on character
detectors. However, training character detectors requires a vast number of
location-annotated characters, which are expensive to obtain. In practice, existing
real text datasets are mostly annotated at the word or line level. To remedy this
dilemma, we propose a weakly supervised framework that can utilize word
annotations, either in tight quadrangles or in looser bounding boxes, for
character detector training. When applied to scene text detection, we are thus
able to train a robust character detector by exploiting word annotations in the
rich large-scale real scene text datasets, e.g. ICDAR15 and COCO-text. The
character detector plays a key role in the pipeline of our text detection
engine. It achieves state-of-the-art performance on several challenging
scene text detection benchmarks. We also demonstrate the flexibility of our
pipeline in various scenarios, including deformed text detection and math
expression recognition.
Comment: 2017 International Conference on Computer Visio